Visual Question Answering Method Based on Yes/No Feedback

doi:10.16451/j.cnki.issn1003-6059.202011009


Abstract: To address possible ambiguity in the question sentence in visual question answering, a visual question answering method based on Yes/No feedback is proposed. The Yes/No feedback mechanism is used to judge whether the model's first answer is correct. When the feedback given by the user is No, the question is re-parsed and multiple disambiguated questions are generated, which produce different candidate answers. The candidate answer with the highest confidence is output as the final result. Experiments on the CLEVR and CLEVR-CoGenT benchmark datasets show that the proposed method achieves higher accuracy than existing methods.
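The feedback loop described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `answer_with_feedback`, `vqa_model`, `disambiguate`, and `ask_user` are hypothetical, and the model is assumed to return an answer together with a confidence score.

```python
from typing import Callable, Iterable, Tuple

# Assumed shape of a model output (not specified in the abstract):
# the answer text plus a confidence score.
Answer = Tuple[str, float]


def answer_with_feedback(
    image: object,
    question: str,
    vqa_model: Callable[[object, str], Answer],   # hypothetical VQA model
    disambiguate: Callable[[str], Iterable[str]], # hypothetical question re-parser
    ask_user: Callable[[str], bool],              # returns True for "Yes", False for "No"
) -> str:
    """Answer a question, retrying with disambiguated readings on a "No"."""
    # First pass: answer the question as originally phrased.
    first_answer, _ = vqa_model(image, question)

    # "Yes" feedback: the first answer is accepted as the final result.
    if ask_user(first_answer):
        return first_answer

    # "No" feedback: re-parse the question into disambiguated variants
    # and collect one candidate answer per variant.
    candidates = [vqa_model(image, q) for q in disambiguate(question)]

    # Assumption: if no alternative reading exists, fall back to the
    # first answer rather than returning nothing.
    if not candidates:
        return first_answer

    # Output the candidate with the highest confidence.
    best_answer, _ = max(candidates, key=lambda c: c[1])
    return best_answer
```

The abstract does not say whether the rejected first answer is excluded from the candidates, so this sketch leaves the candidate list unfiltered.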

Key words: visual question answering, computer vision, natural language processing, syntactic disambiguation, feedback

Received: 2020-03-18

Chinese Library Classification number (ZTFLH): P315.69

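Funding: Supported by the National Natural Science Foundation of China (No. 61373104) and the Basic Scientific Research Funds Program for Higher Education Institutions of Tianjin (No. 2019KJ019)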

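Corresponding author: JIN Guanghao, Ph.D., lecturer. Research interests: computer vision, artificial intelligence, deep learning, and heterogeneous/reconfigurable computing. E-mail: jingh_research@163.com.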

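About the authors: DENG Wei, master's student. Research interests: visual question answering, computer vision, and natural language processing. E-mail: dengwei940517@163.com. WANG Jianming, Ph.D., professor. Research interests: signal processing, machine learning, and intelligent control technology. E-mail: wangjianming@tjpu.edu.cn.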

Cite this article:

DENG Wei, WANG Jianming, JIN Guanghao. Visual Question Answering Method Based on Yes/No Feedback[J]. Pattern Recognition and Artificial Intelligence (模式识别与人工智能), 2020, 33(11): 1043-1053.

Link to this article:

http://manu46.magtech.com.cn/Jweb_prai/CN/10.16451/j.cnki.issn1003-6059.202011009 or http://manu46.magtech.com.cn/Jweb_prai/CN/Y2020/V33/I11/1043
